Skip to content

get rid of some false negatives in rustdoc::broken_intra_doc_links #132748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

lolbinarycat
Copy link
Contributor

@lolbinarycat lolbinarycat commented Nov 7, 2024

rustdoc will not try to do intra-doc linking if the "path" of a link looks too much like a "real url".

however, only inline links ([text](url)) can actually contain a url, other types of links (reference links, shortcut links) contain a reference which is later resolved to an actual url.

the "path" in this case cannot be a url, and therefore it should not be skipped due to looking like a url.

fixes #54191

to minimize the number of false positives that will be introduced, the following heuristic is used:

If there's no backticks, be lenient revert to old behavior.
This is to prevent churn by linting on stuff that isn't meant to be a link.
only shortcut links have simple enough syntax that they
are likely to be written accidentlly, collapsed and reference links
need 4 metachars, and reference links will not usually use
backticks in the reference name.
therefore, only shortcut syntax gets the lenient behavior.
here's a truth table for how link kinds that cannot be urls are handled:

is shortcut link not shortcut link
has backtick never ignore never ignore
no backtick ignore if url-like never ignore

@rustbot
Copy link
Collaborator

rustbot commented Nov 7, 2024

r? @notriddle

rustbot has assigned @notriddle.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. labels Nov 7, 2024
@lolbinarycat lolbinarycat force-pushed the rustdoc-intra-doc-link-warn-more-54191 branch from df3d0f6 to 8ae6586 Compare November 7, 2024 21:41
@rust-log-analyzer

This comment has been minimized.

Copy link
Contributor

@notriddle notriddle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a good idea, but I have found one problem. Consider a sample like this:

//@ check-pass
#![no_std]
#![deny(rustdoc::broken_intra_doc_links)]

// regression test for https://github.com/rust-lang/rust/issues/54191
// false positives

//! This is not an intra-doc link: [`foobar`]
//!
//! [`foobar`]: /index.html

That test case fails, because it thinks /index.html is an intra-doc link. You're right that certain kinds of links aren't allowed to be URLs—that's the Unknown type links.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 7, 2024
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@lolbinarycat
Copy link
Contributor Author

welp, turns out the standard library itself contains several of these false negatives!

@lolbinarycat
Copy link
Contributor Author

...or perhaps there's some deeper bug with intra-doc link generation?

there seems to be some false positives containing spaces, even though there is a matching reference definition...

I guess we can just re-enable unconditionally ignoring links with spaces?

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@lolbinarycat
Copy link
Contributor Author

Ok, this breaks a lot of ui tests. This might actually have a pretty big impact, maybe we should do a crater run or something..?

@lolbinarycat lolbinarycat added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Nov 8, 2024
@notriddle
Copy link
Contributor

It's looks fine to me. Here's the PRs where each of these test cases were added:

test case PR
disambiguator-endswith-named-suffix.rs #127016
weird-syntax.rs #112014
redundant_explicit_links-utf8.rs #115070

The last two are both contrived ICE tests, and all of them are intended to be parsed as intra-doc links. Making it so that they warn seems fine to me.

r? @GuillaumeGomez do you also think this is a suitable bug fix?

@rustbot rustbot assigned GuillaumeGomez and unassigned notriddle Nov 8, 2024
@notriddle notriddle self-assigned this Nov 8, 2024
@GuillaumeGomez
Copy link
Member

I think so. I'm not completely satisfied with the current impact though. Intra-doc links should not be triggered on items that contain characters which can't be in an ident or a path, like Clone\(\). Maybe a different warning in these cases?

@lolbinarycat
Copy link
Contributor Author

@GuillaumeGomez what about just amending the current warning to account for other common mistakes?

honestly makes me wish we had --explain for lints, putting a full help message for each one could get a bit verbose.

regardless, i think that should be a separate issue, since the lint output isn't the most helpful in general (it just recommends escaping the brackets, and doesn't mention other common issues, such as misspelled link references)

@GuillaumeGomez
Copy link
Member

what about just amending the current warning to account for other common mistakes?

Do you have an example of before/after of what you have in mind by any chance?

@lolbinarycat
Copy link
Contributor Author

Do you have an example of before/after of what you have in mind by any chance?

well, the most obvious one is some sort of "consider adding a reference: [whatever]: //some-url.example/", but i also think trying to catch typos by comparing the edit distance of the missing reference to that of existing references could be handy.

and perhaps we should also mention surrounding code by backticks, since code snippets should generally be in code blocks instead of individually escaping chars.

@GuillaumeGomez
Copy link
Member

well, the most obvious one is some sort of "consider adding a reference: [whatever]: //some-url.example/", but i also think trying to catch typos by comparing the edit distance of the missing reference to that of existing references could be handy.

Sounds good to me. The distance between two items could be nice too (as a follow-up?).

@lolbinarycat lolbinarycat added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 22, 2025
@lolbinarycat
Copy link
Contributor Author

I'm quite happy with the current state of this, and think that the vast majority of new warnings created by it will be genuine mistakes. Just waiting on code review now.

Comment on lines +999 to +1018
// |-------------------------------------------------------|
// | | is shortcut link | not shortcut link |
// |--------------|--------------------|-------------------|
// | has backtick | never ignore | never ignore |
// | no backtick | ignore if url-like | never ignore |
// |-------------------------------------------------------|
let ignore_urllike =
can_be_url || (ori_link.kind == LinkType::ShortcutUnknown && !ori_link.link.contains('`'));
if ignore_urllike && should_ignore_link(path_str) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the complete truth table is:

unknown shortcut link unknown reference / collapsed link other link
has backtick never ignore never ignore ignore if url-like
no backtick ignore if url-like never ignore ignore if url-like

Assuming I'm understanding the logic correctly (the difference is to things like [first](second)).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is correct. Technically I do explicitly define the scope of the existing table with the line "here's a truth table for how link kinds that cannot be urls are handled:"

the line is currently 66 chars long, i should be able to fit in the other case without upsetting tidy (100 char limit), do you want me to?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I should've read it more closely.

Comment on lines +999 to +1018
// |-------------------------------------------------------|
// | | is shortcut link | not shortcut link |
// |--------------|--------------------|-------------------|
// | has backtick | never ignore | never ignore |
// | no backtick | ignore if url-like | never ignore |
// |-------------------------------------------------------|
let ignore_urllike =
can_be_url || (ori_link.kind == LinkType::ShortcutUnknown && !ori_link.link.contains('`'));
if ignore_urllike && should_ignore_link(path_str) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I should've read it more closely.

@notriddle
Copy link
Contributor

@rfcbot poll is this a good heuristic for plain [shortcut] links?

@rfcbot
Copy link
Collaborator

rfcbot commented May 23, 2025

Team member @notriddle has asked teams: T-rustdoc, for consensus on:

is this a good heuristic for plain [shortcut] links?

@lolbinarycat lolbinarycat added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 27, 2025
Copy link
Member

@fmease fmease left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please squash some of the micro/fixup commits? Thanks!

@fmease fmease added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Jul 15, 2025
@fmease
Copy link
Member

fmease commented Jul 15, 2025

I think team consensus has now been reached on the poll (majority / two outstanding). Relabeled to S-waiting-on-review (but I've also left some minor nitpicks, didn't do a full review).

@rustbot

This comment has been minimized.

lolbinarycat and others added 13 commits July 19, 2025 14:17
rustdoc will not try to do intra-doc linking if the "path"
of a link looks too much like a "real url".

however, only inline links ([text](url)) can actually contain
a url, other types of links (reference links, shortcut links)
contain a *reference* which is later resolved to an actual url.

the "path" in this case cannot be a url, and therefore it should
not be skipped due to looking like a url.

fixes rust-lang#54191
fix link kind check

Co-authored-by: Michael Howell <michael@notriddle.com>
add error annotation

Co-authored-by: Michael Howell <michael@notriddle.com>
this is in an effort to reduce the amount of code churn caused by
this lint triggering on text that was never meant to be a link.

a more principled hierustic for ignoring lints is not possible
without extensive changes, due to the lint emitting code
being so far away from the link collecting code,
and the fact that only the link collecting code
has access to details about how the link appears in the
unnormalized markdown.
collapsed links and reference links have a pretty particular syntax,
it seems unlikely they would show up on accident.
Co-authored-by: León Orell Valerian Liehr <me@fmease.dev>
@lolbinarycat lolbinarycat force-pushed the rustdoc-intra-doc-link-warn-more-54191 branch from 6afae85 to 4bc4a5b Compare July 19, 2025 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-rustdoc-json Area: Rustdoc JSON backend S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

rustdoc does not warn about broken links if they contain . or []
8 participants